Statistical Analysis of the COVID-19 Mortality Rates with Probability Distributions: The Case of Pakistan and Afghanistan

The COVID-19 pandemic has shocked nations due to its exponential death rates in various countries. According to the United Nations (UN), on October 5, 2021, 895 new deaths were recorded in Russia, 303 in Mexico, 77 in Indonesia, 317 in Ukraine, 252 in Romania, and 54 in Pakistan. Hence, it is essential to study the future waves of this virus so that preventive measures can be adopted. In statistics, probability models can be used under uncertainty to describe the future pattern of deaths caused by COVID-19. Many research studies have used probability models to model the future trend of a particular disease and to explore the effect of possible treatments (in the case of coronavirus, the Pfizer, Sinopharm, CanSino, Sinovac, and Sputnik vaccines). In this paper, a variety of probability models are applied to the COVID-19 death rate in order to identify the one that fits it best. Among them, the exponentiated flexible exponential Weibull (EFEW) distribution is identified as the best fitted model. Various statistical properties are presented, together with real-life applications using the total deaths of the COVID-19 outbreak (in millions) in Pakistan and Afghanistan. The results show that EFEW provides a better fit than other existing lifetime models, including the FEW, W, EW, E, AIFW, and GAPW distributions.


Introduction
The first case of COVID-19 infection in Pakistan was detected on February 26, 2020, in Karachi, in a recent returnee from Iran. From that point onward, the spread of infections accelerated, and on March 18, 2020, it was confirmed that the infection had spread to all regions of Pakistan. More than a hundred deaths and more than six thousand infections were reported in the first seven weeks of the outbreak [1]. As of September 16, 2021, Pakistan has the third-highest number of cases in South Asia after India and Bangladesh, ranks 7th in Asia, and stands 26th worldwide. The first death was reported on March 20 in Sindh province, and community transmission spread rapidly all over the country.
In Pakistan, the curve started to follow an upward trajectory in March 2020 and peaked in June, after which it slowly declined and flattened in August and September. It started to increase again in October of the same year, reflecting the bathtub shape in the data. Figures 1(a) and 1(b) show the average infection rates.
Many researchers have conducted studies to investigate the COVID-19 outbreak. Singh et al. predicted the COVID-19 pandemic for the top 15 countries using the autoregressive integrated moving average (ARIMA) model [2]. Chaurasia and Pal estimated worldwide death rates by employing ARIMA and regression models [3]. Chakraborty and Ghosh utilized a regression tree and an ARIMA model to produce short-term forecasts of COVID-19 cases in multiple countries and to assess the risk of COVID-19 from various demographic and disease characteristics of these countries [4]. Yousaf et al. [5] utilized the ARIMA model to predict infections, deaths, and recoveries. Fong et al. [6] considered small data sets for early forecasting, while Petropoulos and Makridakis [7] also applied a forecasting model. Chen et al. [8] designed an algorithm for predicting COVID-19 data, while Nayak et al. [ [11] with the help of surveillance systems, and a similar study to estimate the final size of the COVID-19 epidemic has also been discussed by Syed and Sibgatullah [12]. Mizumoto et al. [13] estimated the asymptomatic proportion of COVID-19 cases. Statistical models have also been widely applied to prediction problems in other fields. For example, Sukhanova et al. [14] forecast macroeconomic indices with the help of ARIMA, vector autoregression (VAR), and simultaneous equation systems. Yu et al. [15] predicted tourism demand by utilizing the SARIMA model and a neural network (NN). To examine the accuracy with which long-term outcomes can be predicted in patients with coronary artery disease, Lee et al. [16] applied Cox regression. The results showed that model-based prediction was better than doctors' prediction.
Many lifetime distributions are available in the literature for modeling COVID-19 data, but most of them are unable to model the data precisely. For example, the Weibull (W) distribution introduced by Weibull [17], the exponential (Ex) distribution by Epstein [18], and other lifetime distributions with a constant or monotonic hazard rate are unable to model COVID-19 data, or data on any other infectious disease, when the underlying rate is nonmonotonic. In daily life situations, the data do not always follow a monotonic failure function; rather, they often follow a nonmonotonic failure function. For example, patients with tuberculosis have a higher risk in the early stages but a lower risk later on. A similar form of nonmonotonicity occurs in human mortality: the hazard is highest in early infancy and gradually reduces as infants develop, but the danger increases again as people become older, resulting in the bathtub shape. Researchers are therefore trying to introduce functions that are more flexible and can capture nonmonotonic hazard rate functions. For example, Cordeiro et al. [19], El-Gohary et al. [20], Ijaz et al. [21], and Farooq et al. [22] worked on introducing new distributions. We recommend the recent research studies of Ijaz et al. [21,23] and Ijaz et al. [24].
In practice, the modeling of real phenomena becomes more complex when the number of unknown parameters is large. The probability models in this paper have two main advantages. First, the paper presents a best fitted model that is more flexible with fewer unknown parameters. Second, the model leads to better results for various hazard rate shapes, particularly the bathtub shape, where the curve is flat in the middle and rises on either side. Note that the distribution in this paper may not be the best fitted model for data sets with extreme values or outliers.

Material and Methodology
The current study focuses on the best fitted probability model, which has more parameters than some existing models. The model adds a shape parameter (d) to the family of distributions introduced in [22]. The CDF and PDF of the proposed probability model are given in Equations (1) and (2). Substituting the CDF and PDF of the Weibull distribution into Equations (1) and (2) gives Equations (3) and (4), where "b" is the scale parameter and "c" and "d" are the shape parameters. Figure 2 shows the shapes of the CDF and PDF described in (3) and (4), respectively.

The Survival S(x) and Hazard h(x) Rate Function
By definition, the survival function $S(x)$ and the hazard rate function $h(x)$ are, respectively, defined by
\[ S(x) = 1 - F(x), \qquad h(x) = \frac{f(x)}{S(x)}. \]

Computational and Mathematical Methods in Medicine
Using (3) and (4), we obtain the explicit forms of the survival and hazard rate functions of EFEW. Figure 3 shows various shapes of the hazard rate function.
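The definitions $S(x) = 1 - F(x)$ and $h(x) = f(x)/S(x)$ can be sketched in a few lines of Python. Because the explicit EFEW forms of (3) and (4) are not reproduced in this text, the sketch below uses a two-parameter Weibull distribution with CDF $1 - e^{-bx^c}$ as a stand-in; the function names are illustrative assumptions and not part of the original study.

```python
import math

def weibull_cdf(x, b, c):
    """Stand-in CDF F(x) = 1 - exp(-b * x**c) for x >= 0 (assumed, not the EFEW CDF)."""
    return 1.0 - math.exp(-b * x ** c)

def weibull_pdf(x, b, c):
    """Stand-in PDF f(x) = b * c * x**(c - 1) * exp(-b * x**c)."""
    return b * c * x ** (c - 1) * math.exp(-b * x ** c)

def survival(cdf, x, *params):
    """Survival function S(x) = 1 - F(x) for any CDF."""
    return 1.0 - cdf(x, *params)

def hazard(cdf, pdf, x, *params):
    """Hazard rate h(x) = f(x) / S(x) for any CDF/PDF pair."""
    return pdf(x, *params) / survival(cdf, x, *params)

if __name__ == "__main__":
    # Evaluate S(x) and h(x) on a small grid for b = 0.05, c = 1.5.
    for x in (0.5, 1.0, 2.0, 5.0):
        s = survival(weibull_cdf, x, 0.05, 1.5)
        h = hazard(weibull_cdf, weibull_pdf, x, 0.05, 1.5)
        print(f"x={x:4.1f}  S(x)={s:.4f}  h(x)={h:.4f}")
```

For the Weibull stand-in the hazard simplifies to $bcx^{c-1}$, which is monotonic; replacing the two stand-in functions with the EFEW CDF and PDF would allow the nonmonotonic shapes discussed in the text to be explored in the same way.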

Statistical Properties
3.1. Quantile Function. The quantile function is defined as the solution $x$ of $F(x) = q$. Substituting (3) and solving for $x$ gives the final expression for $X$, where $q \sim U[0, 1]$.
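When a closed-form inverse is inconvenient, the equation $F(x) = q$ can also be solved numerically, and drawing $q \sim U[0, 1]$ then yields random samples (inverse transform sampling). The following is a minimal sketch under stated assumptions: the Weibull CDF $1 - e^{-bx^c}$ stands in for Equation (3), and `quantile` and `sample` are hypothetical helper names.

```python
import math
import random

def weibull_cdf(x, b, c):
    """Stand-in CDF F(x) = 1 - exp(-b * x**c), x >= 0 (assumed, not the EFEW CDF)."""
    return 1.0 - math.exp(-b * x ** c)

def quantile(cdf, q, iters=100):
    """Solve cdf(x) = q on x >= 0 by bracketing followed by bisection."""
    if not 0.0 <= q < 1.0:
        raise ValueError("q must lie in [0, 1)")
    lo, hi = 0.0, 1.0
    while cdf(hi) < q:          # expand the bracket until it contains the root
        hi *= 2.0
    for _ in range(iters):      # halve the bracket a fixed number of times
        mid = 0.5 * (lo + hi)
        if cdf(mid) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def sample(cdf, n, rng):
    """Inverse transform sampling: X = F^{-1}(q) with q ~ U[0, 1)."""
    return [quantile(cdf, rng.random()) for _ in range(n)]

if __name__ == "__main__":
    cdf = lambda x: weibull_cdf(x, 0.05, 1.5)
    print("median:", quantile(cdf, 0.5))
    print("five draws:", sample(cdf, 5, random.Random(42)))
```

Bisection is used here only because it needs nothing beyond a monotone CDF; any root finder would serve the same purpose.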

3.2. rth Moment. The rth moment can be obtained by
\[ \mu'_r = E(X^r) = \int_0^{\infty} x^r f(x)\, dx. \]
Using the substitution $z = e^{a(1-e^{-bx^c})}$, so that $dz = abc\, x^{c-1} e^{a(1-e^{-bx^c}) - bx^c}\, dx$, the final expression for the rth moment is obtained.
3.3. Order Statistics. The PDF of the $i$th order statistic is given by
\[ f_{i:n}(x) = \frac{n!}{(i-1)!\,(n-i)!}\, f(x)\, [F(x)]^{i-1}\, [1-F(x)]^{n-i}. \]
Substituting Equations (3) and (4), the 1st and $n$th order statistics of EFEW are obtained by setting $i = 1$ and $i = n$, respectively, where $\alpha^{\ln(1-e^{-by})} \log(\alpha)\,\beta / \big[(1 - \alpha^{\ln(1-e^{-by})})(e^{by} - 1)\big]$ describes quartile values. Table 1 clearly shows that EFEW can model normal data, positively skewed data, and even data skewed to the left.
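The integral defining the rth moment can also be checked numerically. The sketch below applies composite Simpson's rule to $\int_0^{U} x^r f(x)\,dx$ for a truncation point $U$; it assumes the Weibull PDF $bc\,x^{c-1}e^{-bx^c}$ as a stand-in for Equation (4), for which the exact moment $b^{-r/c}\,\Gamma(1 + r/c)$ is available for comparison. The helper name `raw_moment` and the choice of $U$ are assumptions made for illustration.

```python
import math

def weibull_pdf(x, b, c):
    """Stand-in PDF f(x) = b * c * x**(c - 1) * exp(-b * x**c); assumes c >= 1 so f(0) is finite."""
    return b * c * x ** (c - 1) * math.exp(-b * x ** c)

def raw_moment(pdf, r, upper, steps=20000):
    """Approximate E[X^r] by composite Simpson's rule on [0, upper]."""
    if steps % 2:               # Simpson's rule needs an even number of intervals
        steps += 1
    h = upper / steps
    total = 0.0
    for i in range(steps + 1):
        x = i * h
        if i == 0 or i == steps:
            weight = 1.0
        elif i % 2:
            weight = 4.0
        else:
            weight = 2.0
        total += weight * x ** r * pdf(x)
    return total * h / 3.0

if __name__ == "__main__":
    b, c = 0.05, 1.5
    pdf = lambda x: weibull_pdf(x, b, c)
    for r in (1, 2):
        exact = b ** (-r / c) * math.gamma(1.0 + r / c)
        print(f"r={r}: numeric={raw_moment(pdf, r, 200.0):.6f}  exact={exact:.6f}")
```

The truncation point must be large enough that the neglected tail is negligible; for heavier-tailed parameter choices it would need to be increased.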

Special Cases
The special cases of EFEW are as follows.
Case 1. When d = 1. Putting d = 1 in (3) and (4) gives the CDF and PDF of the flexible exponential Weibull (FEW) distribution.
Case 2. When d = 1 and c = 1. Putting d = 1 and c = 1 in (3) and (4) gives the CDF and PDF of the gull alpha power exponential (GAPE) distribution.
Case 3. When d = 1 and c = 2. Putting d = 1 and c = 2 in (3) and (4) reduces the CDF and PDF to those of the NF Rayleigh (NFPR) distribution.

Parameter Estimation
The log-likelihood function of Equation (4) for a sample $x_1, \dots, x_n$ is defined by
\[ \ell(a, b, c, d) = \sum_{i=1}^{n} \ln f(x_i; a, b, c, d). \tag{19} \]
The partial derivatives of (19) with respect to the parameters a, b, c, and d give the likelihood equations. These expressions are not in closed form, but a numerical solution is still possible by using various mathematical techniques.
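To illustrate how likelihood equations without a closed-form solution can be solved numerically, the following sketch fits a two-parameter Weibull distribution with PDF $bc\,x^{c-1}e^{-bx^c}$, used here as a stand-in because the EFEW density is not reproduced in this text. Setting $\partial\ell/\partial b = 0$ gives $\hat b = n / \sum x_i^{c}$; substituting this into $\partial\ell/\partial c = 0$ leaves a single equation in $c$, which is solved by bisection. All function names are hypothetical, and the data are simulated for the example only.

```python
import math
import random

def profile_score(c, xs):
    """d(log-likelihood)/dc after profiling out b: 1/c + mean(ln x) - sum(x^c ln x) / sum(x^c)."""
    logs = [math.log(x) for x in xs]
    powers = [x ** c for x in xs]
    weighted = sum(p * l for p, l in zip(powers, logs))
    return 1.0 / c + sum(logs) / len(xs) - weighted / sum(powers)

def fit_weibull(xs, iters=100):
    """Maximum likelihood estimates (b_hat, c_hat) via bisection on the profile score."""
    lo, hi = 1e-3, 1.0
    while profile_score(hi, xs) > 0.0:   # the score is decreasing in c; expand until it turns negative
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if profile_score(mid, xs) > 0.0:
            lo = mid
        else:
            hi = mid
    c_hat = 0.5 * (lo + hi)
    b_hat = len(xs) / sum(x ** c_hat for x in xs)
    return b_hat, c_hat

def sample_weibull(n, b, c, rng):
    """Simulate from F(x) = 1 - exp(-b * x**c) using X = (E / b)**(1 / c), E ~ Exp(1)."""
    return [(rng.expovariate(1.0) / b) ** (1.0 / c) for _ in range(n)]

if __name__ == "__main__":
    data = sample_weibull(2000, 0.05, 1.5, random.Random(1))
    print("b_hat, c_hat =", fit_weibull(data))
```

For a four-parameter model such as EFEW the same idea applies, but a multivariate optimizer would be needed in place of one-dimensional bisection.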

Applications
In this section, the COVID-19 death data of Pakistan and Afghanistan are used to illustrate real-life applications, and the competing models are compared by means of AIC, CAIC, BIC, and HQIC.
It should be noted that the model with the smallest values of these criteria is considered the best among the candidates.
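The four criteria can be computed directly from a fitted model's maximized log-likelihood $\ell$, its number of parameters $k$, and the sample size $n$. The sketch below uses the forms commonly seen in the lifetime-distribution literature: $\mathrm{AIC} = 2k - 2\ell$, $\mathrm{BIC} = k\ln n - 2\ell$, $\mathrm{HQIC} = 2k\ln(\ln n) - 2\ell$, and $\mathrm{CAIC} = -2\ell + 2kn/(n - k - 1)$. The CAIC form in particular is an assumption, since the source does not state which definition it uses, and the model names and log-likelihood values in the usage example are hypothetical placeholders, not results from the paper.

```python
import math

def information_criteria(loglik, k, n):
    """Return AIC, CAIC, BIC, and HQIC for log-likelihood loglik, k parameters, n observations."""
    aic = 2.0 * k - 2.0 * loglik
    caic = -2.0 * loglik + 2.0 * k * n / (n - k - 1)   # assumed small-sample corrected form
    bic = k * math.log(n) - 2.0 * loglik
    hqic = 2.0 * k * math.log(math.log(n)) - 2.0 * loglik
    return {"AIC": aic, "CAIC": caic, "BIC": bic, "HQIC": hqic}

def best_model(fits, n, criterion="AIC"):
    """Pick the model name with the smallest value of the chosen criterion.

    fits maps a model name to a (loglik, k) pair.
    """
    return min(fits, key=lambda name: information_criteria(*fits[name], n)[criterion])

if __name__ == "__main__":
    # Hypothetical placeholder fits: name -> (maximized log-likelihood, number of parameters).
    fits = {"model_A": (-100.0, 4), "model_B": (-104.0, 2)}
    for name, (ll, k) in fits.items():
        print(name, information_criteria(ll, k, n=50))
    print("smallest AIC:", best_model(fits, n=50))
```

Because every criterion penalizes extra parameters, a model with more parameters is preferred only when its gain in log-likelihood outweighs the penalty.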
The data sets, available at https://github.com/owid/covid-19-data, cover the period from May 2, 2020, to July 4, 2021, for Pakistan and Afghanistan. Tables 2 and 3 report the mortality rates in Pakistan and Afghanistan, respectively.
In Figure 4, both the theoretical and empirical graphs show that EFEW provides the best fit compared with the other existing distributions, which is supported by Tables 4 and 5. Figure 5 shows the Q-Q and P-P plots of the COVID-19 death data. In the Q-Q plot, most of the data points, except a few on the upper tail, follow a linear pattern along the line, and the P-P plot also indicates a reasonably good fit; together with the empirical and theoretical densities and CDFs, this indicates that EFEW reasonably describes the empirical data distribution. Figure 6 depicts the pattern of the hazard rate function. The curve clearly crosses the diagonal line, and hence, the data follow a nonmonotonic hazard rate function. Table 4 shows the maximum likelihood estimates, standard errors, and log-likelihood values, together with the Cramér-von Mises (W) and Anderson-Darling (A) statistics. Table 5 shows the model selection criteria. The results of Tables 4 and 5 show smaller values of these goodness-of-fit criteria for EFEW than for the other models and hence show that EFEW provides a more flexible fit than the exponential (E), Weibull (W), exponential-Weibull (Ex-W), Algoharai inverse flexible Weibull (AIFW), and gull alpha power Weibull (GAPW) distributions.
Figure 7 shows the theoretical and empirical PDF and CDF of the EFEW distribution using the COVID-19 death data from Afghanistan. Both the theoretical and empirical graphs clearly show that EFEW provides the best fit compared with the other existing distributions, which is supported by the numerical values presented in Tables 6 and 7. Figure 8 shows the Q-Q and P-P plots of the COVID-19 death data from Afghanistan. In the Q-Q plot, most of the data points, except a few on the upper tail, follow a linear pattern along the line, and the P-P plot also indicates a reasonably good fit; together with the empirical and theoretical densities and CDFs, this indicates that EFEW reasonably describes the empirical data distribution. Figure 9 follows the same pattern as Figure 6, which means that the death rate in Afghanistan also follows a nonmonotonic shape.
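The Cramér-von Mises (W) and Anderson-Darling (A) statistics mentioned above measure the distance between the fitted CDF and the empirical distribution. As a hedged sketch, the functions below compute the standard forms $W^2 = \frac{1}{12n} + \sum_{i=1}^{n}\big(u_{(i)} - \frac{2i-1}{2n}\big)^2$ and $A^2 = -n - \frac{1}{n}\sum_{i=1}^{n}(2i-1)\big[\ln u_{(i)} + \ln(1 - u_{(n+1-i)})\big]$, where $u_{(i)}$ are the sorted values of the fitted CDF at the data points. Whether the paper reports these forms or a modified variant is not stated in the text, so this is an assumption; the function names are illustrative.

```python
import math

def cramer_von_mises(u):
    """W^2 from the fitted-CDF values u_i = F(x_i); requires each u_i in [0, 1]."""
    u = sorted(u)
    n = len(u)
    return 1.0 / (12.0 * n) + sum(
        (ui - (2 * i - 1) / (2.0 * n)) ** 2 for i, ui in enumerate(u, start=1)
    )

def anderson_darling(u):
    """A^2 from the fitted-CDF values u_i = F(x_i); requires each u_i strictly in (0, 1)."""
    u = sorted(u)
    n = len(u)
    total = sum(
        (2 * i - 1) * (math.log(u[i - 1]) + math.log(1.0 - u[n - i]))
        for i in range(1, n + 1)
    )
    return -n - total / n

if __name__ == "__main__":
    # Values of a fitted CDF at four hypothetical data points.
    u = [0.125, 0.375, 0.625, 0.875]
    print("W^2 =", cramer_von_mises(u))
    print("A^2 =", anderson_darling(u))
```

Smaller values of either statistic indicate closer agreement between the fitted and empirical distributions, with the Anderson-Darling statistic giving more weight to the tails.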
The results of Tables 6 and 7 show that by employing these criteria, smaller values are achieved for EFEW, and hence, EFEW gives a flexible fit over FEW, E, W, Ex-W, AIFW, and GAPW.

Simulation Study of EFEW Distribution
A simulation study has been performed to check the consistency of the parameter estimates of the EFEW distribution. We consider two sets of parameter values, i.e., a = 0.5, b = 0.05, c = 1.5, and d = 0.5 and a = 0.6, b = 0.05, c = 1.77, and d = 0.6. The simulation is performed with 1000 replications. Samples of sizes n = 40, 70, 100, 150 and n = 100, 200, 300, 400 are drawn, respectively, and the bias and mean square error (MSE) are estimated. The mathematical forms are
\[ \mathrm{Bias}(\hat{\theta}) = \frac{1}{N}\sum_{i=1}^{N}\big(\hat{\theta}_i - \theta\big), \qquad \mathrm{MSE}(\hat{\theta}) = \frac{1}{N}\sum_{i=1}^{N}\big(\hat{\theta}_i - \theta\big)^2, \tag{21} \]
where $N = 1000$ is the number of replications, $\theta$ denotes any one of the parameters a, b, c, and d, and $\hat{\theta}_i$ is its estimate in the $i$th replication. Table 8 reports the average mean square errors and biases of each parameter using small and large sample sizes taken from EFEW. It is observed that as the sample size n increases, the average mean square errors and biases decrease for the different parameter values.
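The structure of such a simulation study can be sketched as follows. Because sampling from and fitting EFEW requires Equations (3) and (4), which are not reproduced in this text, the sketch uses an exponential distribution with rate $\theta$ as a stand-in, whose maximum likelihood estimate is $\hat\theta = n / \sum x_i$. The function name, the stand-in model, and the seed are assumptions made for illustration; only the replication count and sample sizes follow the description above.

```python
import random

def simulate_bias_mse(rate, sizes, reps=1000, seed=0):
    """Monte Carlo bias and MSE of the exponential-rate MLE for each sample size.

    Returns a dict mapping n to a (bias, mse) pair.
    """
    rng = random.Random(seed)
    results = {}
    for n in sizes:
        estimates = []
        for _ in range(reps):
            sample = [rng.expovariate(rate) for _ in range(n)]
            estimates.append(n / sum(sample))          # MLE of the rate
        bias = sum(e - rate for e in estimates) / reps
        mse = sum((e - rate) ** 2 for e in estimates) / reps
        results[n] = (bias, mse)
    return results

if __name__ == "__main__":
    for n, (bias, mse) in simulate_bias_mse(0.05, (40, 70, 100, 150)).items():
        print(f"n={n:4d}  bias={bias:.6f}  MSE={mse:.8f}")
```

A consistent estimator shows MSE shrinking toward zero as n grows, which is the pattern the study checks for each EFEW parameter.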

Conclusion
In this article, the EFEW distribution is identified as the best fitted model for the death rates of coronavirus. Various statistical properties of the proposed model have been discussed. The significance of EFEW has been evaluated using the COVID-19 death data of Pakistan and Afghanistan. The results show that the EFEW model is capable of modeling both monotonic and nonmonotonic failure data better than the existing models. Moreover, the findings consistently lead to better results and show increased flexibility compared with the existing probability distributions. Hence, the inclusion of the parameter (d) in the existing model plays an important role and makes EFEW a better choice than the other models for predicting deaths among patients infected with coronavirus.
It is expected that the present class of distributions, along with its special forms, will attract researchers to apply it in other applied research areas such as engineering, hydrology, agriculture, economics, survival analysis, and various others. Moreover, the present study can be extended to neutrosophic statistics. A future research study may also be conducted on the Bayesian analysis of the model parameters under various loss functions.

Data Availability
The simulated data used to support the findings of this study are included within the article.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this article.